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This paper presents a prescription for distilling the information contained in the cosmic mi- 
crowave background radiation from multiple sky maps that also contain both instrument noise and 
foreground contaminants. The prescription is well-suited for cosmological parameter estimation and 
accounts for uncertainties in the cosmic microwave background extraction scheme. The technique 
is computationally viable at low resolution and may be considered a natural and significant gen- 
eralization of the "Internal Linear Combination" approach to foreground removal. An important 
potential application is the analysis of the multi-frequency temperature and polarization data from 
"qq ' the forthcoming Planck satellite. 

o 
o 

>V I. INTRODUCTION 

The detection and subsequent investigation of the cosmic microwave background (CMB) radiation over the past 
four decades has been essential to our current understanding of the universe. Initially providing evidence for an early 
hot dense radiation-dominated phase to the universe [3, [2], the CMB's spatial distribution is now studied in precise 
detail (see @, !, S S 0, S i] for the latest analysis of the WMAP satellite data by the WMAP science team) for clues 
both about the state of the universe at the beginning of the radiation era and the universe's composition, structure 
and subsequent evolution. As cosmologists we have been exceedingly fortunate that, from our vantage point of the 
earth, the galaxy (away from its plane) is both sufficiently transparent and sufficiently lacking in emission for us to 
be able to readily measure the intensity of the cosmic signal shining through. The cosmic signal is also moderately 
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polarized and this carries important additional cosmological information, in particular about the presence of gravity 
waves in the universe (which could be generated by say a high energy inflationary phase before the radiation era) 
and about the reionization of the universe. Unfortunately, with the signal being weaker and the polarization of the 
galactic emission being less understood than its intensity, both the detection and analysis of the polarization part of 
the signal are much more challenging. In addition, there is also extragalactic contamination, such as that from point 
sources. This paper presents a technique to extract the cosmological information out of multiple sky maps, including 
polarization ones, in the presence of both instrument noise and foregrounds. The technique rests upon the assumption 
that the foregrounds do not have the same frequency response as the CMB and requires that the sky maps probe 
1^ linearly independent parts of the spectrum of the sky signal. 

The prescription presented here relates to and builds upon a number of approaches already in the literature. The 
COBE team considered three approaches to separating galactic emission from the cosmic signal, one involving mod- 
elling known emissions using non-CMB data, another involving fitting the maps to functions of given spectral index, 
and a final one involving linearly combining their multifrequency maps to cancel the dominant galactic emission [lOj . 
The WMAP team primarily use a template subtraction procedure to mitigate the effects of foreground contaminants 
in power spectrum estimation, and study the foregrounds themselves via maximum entropy and most recently Markov 
Chain Monte Carlo parametric methods. However, initially for visualization purposes but later also for analysis they 
additionally introduced the "Internal Linear Combination" (ILC) scheme, forming a linear combination of their sky 
maps and choosing the weig hts to minimize the variance between the maps whilst being constrained to preserve unit 
response to the CMB [j, [Til [ijj ■ An harmonic mode- by- mode equivalent of ILC was presented in [l3| and applied to 
the WMAP data in [14j. An alternative harmonic-based generalization of the ILC technique was recently presented 
in [l5j]. The "Independent Component Analysis" (ICA) signal processing technique (see [Hj]), which attempts to use 
non-gaussianity of the one-pixel distribution to separate data into independent signals, has been tested for cosmolog- 
ical uses on the COBE, BEAST and WMAP data [13, H [H H3 ■ A possible weakness of ICA to CMB extraction 
is that the CMB is believed to be very close to gaussian and so can only emerge as what is left behind after the 
non-gaussian foregrounds have been removed. The related "Correlated Component Analysis" idea uses pixel-pixel 
cross correlations instead of non-quadratic single-pixel statistics to separate signals and has also been applied to the 
WMAP data [U • A modification of ICA that forces it to take into account the black body nature of the CMB 
(a key feature of the approach described here) was recently presented in (23|. The "spectral matching" approach 
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of [HLUE] shares many similarities to that presented here, and a recent paper [26[ presents an "additive component" - 
based separation technique. [13] suggests fitting model parameters at low resolution and then using these to solve 
for high resolution maps; see also the very recent work [28| for more on parametric component separation. A Gibbs 
sampling based approach to component separation and CMB power spectrum estimation was presented in [29j and 
applied to the WMAP data in [30] • The WMAP team also test this approach in jy]. 

Unlike many of the above papers, this work focusses on likelihood estimation for cosmological models as opposed to 
CMB sky map production. This requires a quantification of the uncertainties related to the CMB extraction (as also 
stressed by Ref. |3l|). Of course this is a somewhat ill-defined problem seeing as one does not know precisely what 
the foregrounds are. By putting relatively weak priors on the foregrounds though one might hope that such errors are 
being estimated if anything conservatively. Our prescription allows one to naturally incorporate non-CMB datasets 
and the information they contain about the foregrounds into the analysis. With the immiment launch of the Planck 
satellite [32j and the significant new information on the polarization of the CMB that it should deliver, this work is 
also notable in treating all Stokes parameters of the CMB in a unified manner. Numerical testing of the scheme and 
application to existing WMAP data are underway. 



II. DATA AND PRIORS 



Our starting point shall be a collection of n sky maps. Any that are usefully described in terms of a temperature 
will be assumed to be in thermodynamic temperature units. Note from the start that these maps do not all have 
to be "CMB" maps; other data sets (e.g. radio surveys, starlight polarization maps, point source maps, the WMAP 
"spurious signal" maps) can be included in the analysis in a unified manner and might be useful if they have physical 
correlations with contaminants in the CMB channels. 

We assume the sky map is discretized into elements, typically pixels or spherical harmonic coefficients up to some 
( ma , but perhaps say wavelet coefficients. For each element i we have the associated measurements for each sky map. 
We can stack these measurements into a vector X(i). These vectors can be further stacked into a big vector X (the 
entire data set). We'll assume we can estimate or calculate the inverse noise covariance matrix N _1 for X. 

Next, let us assume a linear relation between X and some underlying "signals" S, i.e. 

X = AS + M, (1) 

where M is the noise realization. Some of the signals will of course be the CMB, and the others will be the foregrounds. 
These "effective" foreground signals don't necessarily have to be thought of as physical processes, just as unwanted 
contaminants. We shall not assume that these foregrounds are uncorrelated with each other, but they will, however, 
almost by definition be taken to independent of the CMB. To extract cosmological information, we shall work towards 
a probability distribution for that part of S associated with the CMB sky. 

A key assumption shall be that the CMB is a blackbody. As a consequence, the "foreground" signals will implicitly 
be assumed to be linearly independent of the CMB in frequency space. 

We'll typically assume that the "mixing matrix" A is block-diagonal in pixel space, meaning that measurements in 
a given direction only depend on the signals in that direction (and we are assuming that any beam convolution effects 
have already been accounted for). In this case, we can write: 

X(i)=A(ii)S(i)+M{i). (2) 

(Here and onwards, a matrix followed by element indices denote the submatrix appropriate to the indices of the 
original matrix.) S(i) encodes the strength of the signals, and A(ii) controls how much each signal affects each 
instrument channel. The degeneracy between the amplitude of a column of A(zj) and the normalization of a given 
foreground signal is in principle handled by assumed priors, and the discrete symmetry of column interchange of A(m) 
leads to an uniform multiplicative overcounting that can be ignored. 

The blackbody nature of the CMB is realized as a requirement on the form of the A(m)'s. If S(i) T takes the form 
(S CMB (i) T , F(i) T ), with S CM b for the CMB component(s) and F for the foregrounds, then A(n) must take the form 



A(ii) = 



e 




(3) 



where e is a "tall" matrix whose width equals the number of components of S C mb(*) and whose elements are ones 
or zeroes (since temperature- type maps are assumed to be in thermodynamic temperature units). If a channel is 
polarization-sensitive, then its corresponding part of e will be a "small" identity matrix (with i denoting identity 
matrices of various sizes from now on) . If a channel is independent of the CMB its corresponding part of e will be a 
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"small" (row) matrix of zeroes (all "zero" matrices will be denoted 0). Such a channel is still useful in the analysis 
because it potentially responds to some of the foreground signals and thus carries information about them that can 
be used in extracting the CMB. The part of a marked ? encodes the spectral indices of the foreground signals. 

Because some physical foregrounds (e.g. galactic synchrotron radiation) are known to have a spectral index that 
varies across the sky, one might expect that our A(ii) should indeed be a function of the element i. However, an 
alternative procedure is possible if we have many frequency channels; we can take A(ii) — a for all i, and describe 
a single physical foreground with varying spectral index as two or more correlated effective signal components which 
have linearly independent corresponding columns of a. 

From now on we will take a to be a square n-by-n matrix, assuming there to be as many signal components as we 
have frequency channels. As we'll be marginalizing out the foreground signals, it seems reasonable to allow for as 
many of them as wc might need to describe the data on top of the noise. 

We are aiming towards using Bayes' Theorem to get a probability for a CMB model (or perhaps a CMB sky map) 
given the data, marginalized over foregrounds. Let us set down the relevant conditional probabilities that we shall 
need. Firstly, 

p(S|CMB model, f.g. model) = p(S CMB |CMB model).p(F|f.g. model) (4) 

by the independence assumption, where "f.g." is short for foreground. Here S CM b is the CMB signal (temperature and 
perhaps polarization) and F are the foreground signals, as introduced above. Typically the CMB model is describable 
in terms of the CMB power spectra, but in principle we could consider non-gaussian corrections too. In the gaussian 
case, 

p(S CMB |CMB model) = 1 e -sS l eC- 1 ScMe/2_ ^ 
Vl 27rC l 

For isotropic models in harmonic space, C is diagonal in I and conventionally expressed in terms of C;'s (i.e. 

|a^(if m ,| = Cf®5ij'5 m - m ' where P and Q denote Stokes parameter type.). 

Next, the probability of the data given the signals and the mixing matrix is given by the probability of obtaining 
the noise realization M = X — AS: 

p(X|S, A) = * e -(X-AS) T N- 1 (X-AS)/2^ (6) 

Here we have used maximum entropy to assign a gaussian distribution to the noise (see (33J) and have decided not to 
add in and marginalize over any non-gaussian corrections we could imagine (which would complicate the analysis) . In 
some cases we might wish to consider "inverse noise matrices" N _1 that are not actually the inverse of anything due 
to having had say monopole and dipole components, point sources, regions of strong galactic emission or instrumental 
effects projected out of them. (Such projections ensure that a model is not penalized if the data and model "disagree" 
along such directions.) In this case the determinant factor will not exist but seeing as it is independent of the model 
it can be safely ignored. Note that in many applications where the sky map data is actually derived with a maximum- 
likelihood approach from a timestream it is in fact the inverse noise covariance matrix and not the noise covariance 
matrix itself that naturally emerges. 

Now we need some priors. For the C;'s, we might employ a Jeffreys'-type prior I by Z, or alternatively assume 
they come from a known cosmological model which itself has priors on its parameters. In a middle-ground position 
we might incorporate a positive correlation between neighbouring C;'s, expecting them to be given by integrals over 
related transfer functions times the same underlying three-dimensional primordial power spectrum [341 ]. As simple 
test cases we could assume that the CMB is gaussian white noise on the sky with some unknown amplitude or that 
the CMB is scale-invariant with unknown amplitude. We shall see that the "white noise" assumption leads directly 
to the ILC approach as used by the WMAP team. 

We've already effectively imposed a prior that the mixing matrix A is block-diagonal in pixel space and have argued 
that it can be taken to be isotropic across the sky if we have enough sky maps to work with. (If we do relax the 
isotropy assumption on a then we should parametrize a slow variation of a across the sky with a suitable probability 
distribution over the parameters; see the recent work of [l5| for an application of this type of approach to ILC map 
making.) One might be concerned whether it is most natural to put a measure on a or on its inverse, since the 
former is perhaps more natural when thinking about the spectral response of physical foregrounds whereas the latter 
is heuristically at least what is needed to go from the data to the CMB. In fact we shall develop a measure that 
is invariant under inversion. A matrix measure naturally involves some power of the determinant of the matrix, 
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and from a _1 a = i one can see that 5a _1 = a Maa 1 and hence that <9(a _1 )/9(a) = | a - 1 1 2rl . 1 So a measure of 
|a| m <ia transforms into a measure |a _1 \~ m ~ 2n deT 1 for some power m. Full form symmetry is achieved if we choose 
m = —m — 2n, orm = —n. This choice yields the Haar measure on GL(n,i?), the group of symmetries of a real n- 
dimcnsional vector space, and so seems very natural. This measure is also scale- invariant as well as inversion-invariant. 
From this base we now consider our constraints on the mixing matrix from our taking the CMB to be black-body and 
to be independent from the foregrounds. Start from the matrix equation a _1 a = i. If we write a -1 as 

" fe) • < 7 » 

then we must have 

w T e = i, (8a) 

u T e = 0, (8b) 

where w and u are appropriate "tall" matrices. From this we can see that delta functions required in our measure on 
a to force the first columns of a to equal e correspond to the term |a _1 | _1 (5(w T e — i)<5(u T e) in a measure for a -1 . 
With the form of e as discussed above, Eqs. © render |a _1 | independent of w. 2 Hence the components w needed to 
extract the CMB only actually appear in the delta function 5(w T e — i) and not in any prefactor in a base measure 

— 1 3 

on a . 

The final prior is for the foregrounds. We shall take them to be gaussian, with auto- and cross- I by I correlations. 
The gaussian assumption is not ideal; indeed, the WMAP K- and KQ- sequence of masks are actually constructed by 
excluding pixels whose temperatures fall into the skewed high tail of the one-point temperature distribution function. 
However, by taking the inverse covariance to zero, a gaussian model does at least provide an analytically-controlled 
method of approaching a flat prior on the foregrounds. A flat prior might be particularly appropriate for polarization, 
given our limited physical understanding of galactic polarization, and so is what a maximum entropy argument would 
yield. 4 



III. PROBABILITIES FOR CMB POWER SPECTRA WITH NOISE 



We now move on to our main interest, the calculation of probabilities for cosmological models, expressed in terms 
of their predicted CMB power spectra, in presence of general (anisotropic and correlated) instrument noise and 
foreground contaminants. 

We shall take a limit towards a flat prior on the foregrounds, starting from a gaussian prior on them. As we shall 
see, this limit conveniently decouples the foreground-related parts of the inverse mixing matrix from the rest of the 
problem. We'll call the "grand" covariance matrix, of CMB and foregrounds combined, G, and under our stated 
assumptions this fully specifies our CMB and foreground models. 

We have: 

KXIa^G) = /dSKX,S|a-\G) (9) 
= y* ( iSj5(X|S,a- 1 ,G)p(S|a- 1 ,G) (10) 

„-(X-AS) T N _1 (X-AS)/2 

ds e - _= v^W^ 8 G 8/2 (ii) 

V^TrNf 



1 The latter is most easily seen by considering the two linear transformations on <5a that aggregate to give <5a -1 , i.e. bracketing the 
latter as — (a _1 <5a — 1 )a — 1 , using the chain rule for Jacobians and performing row and column interchange operations on each resulting 
n 2 -by-n 2 matrix. 

2 This can be seen by using Eqs. {(8} to substitute for the variables appearing in the first ?1cmb columns of a -1 , then appropriately adding 
in the other columns to make the first ones equal (i, 0) T in the determinant calculation. 

3 If we do have useful information on the foreground spectra we would presumably first encode this in a and then consider transforming 
to a -1 . If this information is actually coming from other non-CMB sky maps, it might perhaps be easier to perform a combined analysis 
including these maps as discussed above but with a larger a. 

4 As with for the mixing matrix, we could take any data sets that do give us extra information into account by directly including them 
in the analysis. 
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and to do the integral over S it is useful to change variables to U = AS, generating a |A| determinant factor. With 
the mixing matrix being uniform across the sky, A decomposes into a's down the diagonal and so |A| = la^ 0101 ". Note 
that the right-hand-side has been written so as to not explicitly depend on G, only on G _1 , to make the limiting 
process clearer. We can perform the gaussian integral over S and use the result in Bayes' theorem, in conjunction 
with our priors discussed in Sec. [TTJ to find: 



p(a-\G|X) ex p(X|a-\G)p(a-\G) (12) 

v /|CF T j ( 5( w T e _ i)(5(u T e)e +X T N- 1 (N- 1 +A- T G- 1 A- 1 )- 1 N- 1 X/2-X T N- 1 X/2 
« K G )| a _i|_iv e j enl la-l|n+l yiN- 1 + A _T G _1 A -1 ' ' ' ' " 



To do this integral we needed N _1 + A _T G _1 A _1 to be invertible. We also see that as long as the number of 
elements is much larger than the number of frequency channels, the |a _1 |" +1 term from our prior on a -1 becomes 
insignificant and could be ignored relative to the |a _1 | elem term from the likelihood. We keep both however, defining 
N = N elem - n - 1. 

It is convenient to take the flat-foreground-prior limit F _1 — > at this point. First note that |G _1 | = |C _1 ||F _1 | 
since CMB and foregrounds are assumed independent. By taking the limit in the same manner for the different CMB 
models that we are considering, we can safely ignore the vanishing |F _1 | factor (alternatively we can say that we are 
only interested in likelihood ratios between models, in which case the factor vanishes). 

Next, we consider A _T G _1 A _1 . Written out more explicitly, this takes the form 

a-TG-^Mja- 1 a- T G- 1 (l,2)a- 1 \ 

a- T G- 1 (2,l)a- 1 a^G" 1 ^, 2)a' 1 (14) 

where we have decomposed G _1 into element-labelled blocks, G _1 (ij) corresponding to the block of G _1 relating to 
elements i and j. As F _1 — > 0, a.~ T G (ij)aT 1 — ► wC _1 (ij)w T . So, introducing the nN e i em -by-n CMB N e i cm matrix 
Q: 

w • • 

I w I (15) 



we have 

-Tr*-l a -1 



QC X Q T . (16) 



Thus the probability factors into a term depending on w and C alone and a term depending on u alone (the 
<5(u T e)/|a _1 | _Ar part). When we integrate over a, or equivalently over w and u, to obtain the marginalized probability 
for the CMB model, the u integral thus factors off to give a multiplicative constant that can be ignored. 
Hence we obtain an effective probability for a CMB model of 

r +X T N" 1 (N" 1 +QC" 1 Q T )" 1 N" 1 X/2-X T N" 1 X/2 

p(C\X) cx / dw<5(w T e-i)- . (17) 

J v/ICHN- 1 + QC-!Q T | 

The integral over w will be performed using the saddle-point method developed in Appendix [Al 

Before integrating though, there may exist opportunities to significantly simplify (|17p . depending on the invertibility 
properties of the matrices remaining. If so, the size of subsequent matrix operations is substantially reduced, easing 
the computational burden. For the rest of this section we consider the simplest case of both N _1 and C _1 being 
invertible; more complicated cases are deferred to subsections below. Following the discussion in Appendix [Cj we are 
able to apply formulae (|C4[) and (|C3[) to simplify (fTT]) to 

r -X T Q(C+Q T NQ)- 1 Q T X/2 

p(C\X) cx p(C) / dw<5(w T e - i) . (18) 

J ^/\C + Q T NQ\ 

Note how the size of the matrix that needs to be inverted and have its determinant found has reduced from nN e \ eT[L - 
by-nN e \ em to n CMB -/V e i em -by-n MB-Welem) where n CMB is the number of CMB Stokes parameters under consideration, 
saving O((ri-/n MB) 3 ) in required computation. 
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Comparing with Eq. (|A1[) , we have 

S = i (In |C + Q T NQ| + X T Q(C + Q T NQ) _1 Q T X) , (19) 

and by varying with respect to w the appropriate first and second derivatives for S can be read off for use in (|A1 1|1 : 

S V S = tr5w T {N{ij)w(B-\ji)-n-\jm)X(m)X T (n)I)-\ni)) +X(i)X T {j)wT>- 1 (ji)} (20) 
Sl,S = tr5w T N(ij)5w(B~\ji)-T>- 1 {jm)X{m)X T {n)T>- 1 (ni))/2 

+ tr(5w T N(ij)wD" 1 (j/c) ( 5w T N(fci)w 

+ tr<5w T N(ij)wD" 1 (j/c)w T N(fci)5w 

+ tr Svf T N(ij)X(i)X T (j)wD" 1 (ji) /2 

- tr 8w T X(i)X T (j)wT>- 1 (jk) (<5w T N(fcm)w + w T N(mn)5w) D _1 (m) 
+ tr(5w T N(ij)wD"\j/i:) ( 5w T N(fcm)wD" 1 (TOn)w T X(n)X T (p)wD" 1 (^) 
+ tr(5w T N(ij)wD- 1 (jfc)w T N(fcTO)(5wD- 1 (TOn)w T X(n)X T (p)wD- 1 (^)/2 

+ tr(5w T D- 1 (ij)5wN(jfc)wD- 1 (fcTO)w T X(m)X T (n)wD- 1 (np)w T N(pi)/2 (21) 

where we have defined D to equal C + Q T NQ and assumed implied summation over repeated element indices. (To 
obtain full equivalence with Eq. (jAlip we need to map from the matrix w here to the vector x there. This can be 
done explicitly by "vectorization" of w if required, as discussed in Appendix [5J) The numerical calculation time is 
dominated by the required 0((n C MB-^elem) 3 ) inversion of D. So as long as the initial guess for w is reasonable 5 , one 
should be able to converge to the saddle-point expression for w in a few 0((n CM B^Veiem) 3 ) steps. Having converged to 
a saddle point solution w s . p . , we evaluate the exponent (|19|> there, calculate a prefactor 6 involving second derivatives 
of S at the saddle point as discussed in Appendix [X] and multiply by any prior in order to finally obtain the posterior 
probability for the CMB model in question. 

A. Noise Matrices with Projections 

Often one wishes to project additional degrees of freedom out of a formally invertible inverse noise matrix in 
order to render our posterior probabilities insensitive to certain complications with the data that would otherwise 
be unaccounted for. For example, one might project out monopole and dipole contributions of the maps from an 
experiment like WMAP, the monopole because the experiment is basically differential and the dipole because of the 
earth's motion relative to the CMB. Or one might project out regions of strong emission from the galaxy because one 
does not expect the relatively simple foreground model used to be accurate there. The WMAP team also project out 
a "transmission loss imbalance" mode from their maps. In some cases one can sometimes just reduce the dimensions 
of the matrices involved; e.g. one might just use the / > 1 modes in harmonic space to forget about the monopole and 
dipole, or for a galactic cut working in pixel space one might simply only consider data from pixels outside of the cut. 
However, when say both a monopole and dipole projection and a galactic cut are made, no such simple approach is 
possible, and one must proceed as follows. 

Each of the n pro j modes to be projected out are expressed as a column vector, and these column vectors are arranged 
into an nAf e ie m -by-n pro j matrix P. Then, N _1 is replaced by: 

INT 1 -> N~^"|p = 1NP 1 — N _1 P (P T N~ 1 P) 1 P T N _1 . (22) 

(Note that the normalization of the modes cancels out in the formation of N" 1 ^.) N _1 |p has been constructed so 
that when it is multiplied into any linear combination of the modes to be projected out it returns zero. Hence any 
discrepancy between the data and a potential signal along these modes is not penalized in the likelihood. 



5 A suitable starting place should be obtainable from the "noise-free" solution evaluated for a reasonable fiducial model, as derived in 
Sec. |IV] 

6 This prefactor coming from the integral over w is helping to encode the uncertainties from the foreground separation into the likelihood; 
this term would be missed in any approach that attempts to use a standard CMB likelihood formula applied to some best-fit linearly 
combined map and effective noise matrix. 
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Eq. (fT7|) becomes 

f +X T N- 1 |p(N- 1 |p+QCT 1 Q T )- 1 N- 1 |pX/2-X T N- 1 |pX/2 

p(C\X) cx / dw(5(w T e-i)- , , (23) 

J ■ /l C||(N-i| P + QC-iQT)| 



which, after some work involving (|C4[) and (|C3|) and keeping N to refer to the assumed-invertible formal noise matrix, 
reduces to: 



-X T Q (C+QTNQ)- 1 | qTp Q T X/2 

p(CIX) ocp(C) / rfwJ(w T e-i) - 6 P . (24) 

i y/\C + Q T NQ||P T Q(C + QTNQpQTp] 

We can now expand the delta function as a gaussian and integrate as above. Additional terms in the first and second 
derivates of the exponent come from the projection, further complicating the formulae, but the numerical time of the 
calculation is still dominated by the inversion of C + Q T NQ. 

There is some subtlety about which projections can be handled in this manner. For example, with temperature maps 
alone, one cannot independently project out all measurements associated with a given element; the second determinant 
under the square root turns out to be proportional to |ww T | Ar ° lcm and is thus singular. (This is happening perhaps 
because as the foregrounds are being marginalized out and the dimensions effectively reducing from nN e \ em to iV e i em 
the previously orthogonal projections might be becoming linearly dependent.) Potentially including polariation again, 
one can however project out any response to the CMB in the i'th element say with P(i) equal to e and all other 
entries in P zero. With the help of the delta function Q T P(z) becomes equal to the identity matrix i. Finally, one 
obtains just the result one would have gotten by forgetting about the element in question in the first place and starting 
with a reduced problem with only N e i enl — 1 elements rather than N e \ eul elements, as can be seen using rules for the 
determinants of blocked matrices and their inverses. By combining many such projections into a wide projection 
matrix one can form a mask, or by considering an appropriately weighted vector sum of such projections one can 
project out spatially coherent modes such as the monopole or dipole from pixellized data. 

B. The general case 

Finally we mention the case when the inverse noise matrix is completely general. Now, we cannot use the Woodbury 
formula to simplify the exponential in (|17p . Rather, we have to use the effective exponent from (|f 7j) as is and directly 
proceed with the replacement of the delta function with a gaussian. It is in fact very appealing to work like this 
directly with inverse noise matrices and inverse- noise- weighted maps, as these are what naturally emerge out of a 
maximum-likelihood analysis of timestream data. However, the practical cost is that one now has to invert full 
nN e i em -by-nN c \ cm matrices for each CMB model considered. 

IV. THE NO-NOISE LIMIT 

It is interesting to consider what happens if the noise is negligible. Starting from (fi"8|) . we can take the limit N — > 
to obtain: 

p(C|X) no noise oc f dw<Hw T e-i)-^L- s ', (25) 

with exponent S' = vec(w) T Rvec(w)/2 (with "vec" indicating vectorization as discussed in Appendix [Bl and using 
Eq. (|B5[) ) where we have defined 

R(C)=^C- 1 (ii)®X(i)X T (j). (26) 



7 By only using certain columns of e in P(i) one can in fact project out individual Stokes components of the CMB, paving the way for 
independent temperature and polarization masks. 
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Here eg) denotes the Kronecker product, discussed in Appendix [Bj Notice that S' is now quadratic in the components 
of w and hence the saddle point integration over w is exact. Re-expressing the w T e = i constraint as 



(i eg) e T )vec(w) = vec(i), 
Eq. (|A12[) gives the saddle point for vec(w) at: 

vec(w) s . p . = R -1 (i eg) e) ((i eg) e T )R _1 (i <g) e)) vec(i) 
("s.p." for saddle point). Using (|A13|) . the posterior for the CMB model is 

P(C) 1 -vcc(i) T ((i®e T )R- 1 (i(»e)) _1 vec(i)/2 



p(C|X) ex 



R 



(i«ie T )R- 1 (i(g)e) 



(27) 



(28) 



(29) 



Eq (|28p bears a marked resemblance to the formula for the weights in the ILC approach and this will be expanded 
upon in the following section. 



V. DERIVING SKY MAPS AND RELATIONS WITH THE ILC PROCEDURE 

So far in this paper we have concentrated on deriving likelihood functions for CMB models rather than on producing 
CMB sky maps. However, our procedure can of course be used to generate sky maps and associated noise covariances 
including contributions from foreground removal. 

Now p(S|X) = / dAdGp(S, A,G|X) cx / dAdG p(X.\S, A, G)p(S, A, G) using Bayes' Theorem. Using the same 
conditional probabilities and priors as above, again introducing U = AS, and using Eq. (|A8|) . we obtain: 

(S CMB «) « J dCp(C|X)w s T p .(C)X(i) (30) 

along with a somewhat messier expression for (S C mb(*)ScmbO')) using Eq. (|A9|) . In the no-noise limit, these expressions 
become exact and wj p (C) and p(C|X) are obtainable from Eqs. (|28|) and (|29|) respectively. 

To establish a direct link with ILC let us try to construct the CMB temperature (and possibly polarization) maps, 
Somb- Now, imagine we are working in pixel space in the no-noise limit and we have a prior on the CMB that it 
is sky-uncorrelated, i.e. it is gaussian white noise. Then C(i,j) — cdij say and R reduces to c _1 gx, defining 

x = J2i X(i)X T (i). Then R 1 is just c eg) x _1 using ()B3[) . Eq. (|28]) simplifies to become independent of c, and so, no 
matter what our prior on c actually is, we have 

<Sc MB (i)> =w s T p X(z) (31) 

with 

w s . p . = x _1 e (e T x _x e) -1 . (32) 

For temperature alone, in which case e is just (1, . . . , 1) T , this is exactly the ILC result. So Eqs. (|3Tj) and (|32|) are 
the natural generalization of ILC when treating temperature and polarization in a unified manner. Typically the ILC 
procedure is described as choosing the linear combination of channels that has minimum variance whilst still retaining 
unit response to the CMB. Our Bayesian derivation here (albeit with flat priors on the foregrounds and a very strong, 
and obviously incorrect, prior on the CMB) here gives an alternative, more insightful, perspective. For example, we 
see that one might not need to correct for "cosmic covariance" [l2l [3f| ; the ILC coefficients already give the mean 
maps. We might replace the "white noise CMB" prior with one based on a fiducial model up to an overall amplitude 
and thus derive "improved" ILC coefficients that correctly take into account spatial correlations in the CMB. A very 
simple case appropriate for low-^ would be to take the CMB to be scale-invariant, Ci cx 1/(1(1 + 1)), and work in 
harmonic space, calculating R with the appropriate weights. 

As mentioned below Eq. (|30|) . we can calculate a measure of the foreground- induced pixel-pixel covariance of our 
maps. For clarity we shall do this for temperature alone in the no-noise limit. Then S C mb(U is simply t(i), the 
temperature in the i'th pixel. With the "white noise" CMB prior, c is given by the single number AT 2 . With a 
Jeffreys' prior on AT 2 one finds 

HmtU))) = m 7™ ~ <*W> e) T ^S^(X(j) - (t(j)) e). (33) 

Acicm - n + 1 e x x x e 
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This is a very reasonable result; uncertainties are given by a quadratic form on differences away from a blackbody, 
with the quadratic form determined by the covariance of the sample. 

One can attempt generalizing the ILC procedure in other ways. For example, staying with temperature alone, 
one could keep the "white noise" CMB prior, but try to add in instrument noise at some level, taking it say to be 
sky-uncorrelated and isotropic, described by the same matrix n at each pixel. Then the joint likelihood for w and 
AT 2 , with a flat prior on AT 2 , turns out to be: 

jp(AT 2 ,w|X) oc lW 6 ' „ e'^^-*-, (34) 
VAT 2 + w T nw 

and the joint maximum likelihood point for AT 2 and w together occurs at the just same value for w, namely 
x _1 e/ (e T x _1 e), as in the noise-free case, with unchanged variance for w. This is at odds with the idea of subtracting 
the noise contribution n from the total variance x (which might seem plausible on the grounds of focussing on the 
signal) in deriving the ILC coefficients. 



VI. POSSIBLE EXTENSIONS, PRACTICAL CONSIDERATIONS AND CONCLUSION 

While the scheme presented above treats foregrounds very generally, there is scope for further extension. One 
possibility is to allow for a slow variation of the mixing matrix across the sky. In fact it would be technically easiest to 
perform a low-order spherical harmonic expansion of the inverse pixel-space mixing matrix; then w T (x)e = i would 
reduce to w^e = i and wj m e = for I > 0. Ref. [151 ] applies an analagous approach to map making, allowing the ILC 
coefficients to vary across the sky. Notice that this is distinct from the harmonic approach of [ljj, EH an d ^ s rather 
more physical when looked at from a component separation approach; the l-by-l analysis there effectively imagines the 
data from given direction to come from a convolution of signals around that direction, whereas [l5| and the suggestion 
here consider the data in a given direction to come from signals in that direction alone. 

Another natural variation would be to relax the flat prior on the foregrounds and rather try to marginalize over 
power spectra for the foregrounds. Now the separation achieved above between CMB and foregrounds would not 
occur and one would have to think about possible priors on the foreground part of the inverse mixing matrix (luckily 
we have seen that the base prior on a rapidly becomes irrelevant as the number of elements increases). The underlying 
model would be close to that used in the Gibbs sampling approach of [2{|, and in the context of map making one 
would be left with a scheme very similar to that of [24| (but with the advantages of working with a^ 1 rather than a). 
Now one of the distinguishing features of the foreground signals is that they are non-gaussian, and one could hope to 
exploit this if one could develop a plausible non-gaussian correlated probability distribution for the foregrounds. As 
mentioned earlier, one might also consider non-gaussian corrections to the CMB itself, but in this case at least we 
suspect them to be small. 

A conceptually simple change would be to relax the assumption that the mixing matrix a is square, if say one is 
confident in the number of physical foreground emission processes operating. However, many of the manipulations of 
our approach depend technically on the invertibility of a, so this change would be difficult to implement and would 
only be trustworthy if any neglected foreground components (e.g. coming from non-modelled spectral index variations) 
were below the level of the detector noise. 

Let us now move on to discuss certain practical considerations in implementing a scheme such as that described 
here. An obvious difficulty with a straightforward implementation is the usual one confronting CMB analysis, namely 
the need to numerically invert large matrices corresponding to the large size of the data sets involved. Although the 
cosmic signal is naturally band limited, the cutoff is too high for a full resolution analysis over a good fraction of 
the sky to be practical. 8 With simplifying assumptions however, some progress may be possible. 9 In any case, a full 
resolution analysis is probably not even necessary. Instead, one might perform a split analysis, treating large and 
small angular scales in a different manner (see [36[ for a discussion in the absence of foregrounds). While the large 



Of course one might be able to apply the scheme directly to data from high resolution ground based experiments that only focus on 
small patches of the sky. 

For example, one might work in harmonic space and assume that the detector noise, along with the signal, is diagonal in this basis. 
Then many of the matrix operations simplify. Still implementing a general cut using the projection technique, the computational burden 
might be reduced by the ratio of the cube of the number of modes projected out to the cube of the number of elements. A limited number 
of further corrections to the noise matrix might also be efficiently implementable using Woodbury formula techniques. Alternatively 
one might consider perturbative corrections to the noise model away from the (masked) isotropic case. 
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angular scales (where some of the most interesting new results might manifest themselves and where particularly B- 
mode polarization might have some signal relative to detector noise for Planck) might be dealt with accurately by an 
application of the method of this paper, an heuristic approximation to the method (or a different scheme altogether) 
might be all that is needed for high I. Of course the problem will not precisely factorize into a large scale part and a 
small scale part and so the coupling will have to be taken into account or shown to be negligible. In addition, there 
are issues with the production and interpretation of low resolution maps and their (inverse) noise matrices (see the 
description in j37| of the procedure used by the WMAP team), with difficulties due to aliasing and the finite size 
of the pixels themselves. In this work we have assumed that the signal has been deconvolved from the beam; the 
appropriate noise matrices must be obtained by transforming those appropriate to the common case of solving only 
for the beam-smoothed signal. Any beam uncertainties should be incorporated in the noise matrices also. 

Often the time-ordered data is discretized into a map on the sphere using some pixelization scheme as opposed 
to going directly to harmonic space. Even if all signals are strictly band-limited, with non-zero spherical harmonic 
coefficients only for I < Z max say, fully describing such signals in pixel space requires more than (Z max + 1) 2 pixels. One 
then has to be careful about the validity of certain matrix operations that need to be performed. For example, the 
inverse transformation from an unconstrained map in pixel space to harmonic space is not well-defined. Similarly, the 
pixel-space signal covariance matrix is non-invertible. The harmonic-space inverse noise matrix can be obtained by 
transforming the pixel-space inverse noise matrix though. An important issue arises if a galaxy cut projection is to be 
made: the mask itself needs be band-limited in order for pixel-space and harmonic-space approaches to be equivalent. 

While all of these complications need to be investigated and accounted for if necessary, it is not likely that they 
will lead to a fundamental problem in the scheme defined here for cosmic information extraction in the presence of 
foregrounds, just as they have not prevented standard cosmological analysis of CMB data. 

In conclusion, this paper has described a prescription for the analysis of multiple sky maps for cosmological in- 
formation in the presence of both detector noise and foregrounds. The prescription takes into account uncertainties 
in the foregrounds by modelling the foregrounds in a very general way and then marginalizing over the foregrounds. 
The noise can be very general, potentially correlated over the sky and even correlated between frequency channels, 
and include projections. Work is underway both to subject the prescription to extensive Monte Carlo testing and to 
apply it to the WMAP data. Assuming its performance is good the scheme should be ideal for the analysis of the 
temperature and polarization data from the forthcoming Planck satellite. 
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APPENDIX A: CONSTRAINED SADDLE POINT INTEGRALS 



This appendix explains the technique we use to evaluate constrained integrals in a saddle-point approximation. 
Imagine we are performing an n-dimensional integral over variables x l of some function e~ s ^ x \ with m < n linearly- 
independent linear constraints of the form c*x % = d k (summed over i) on the x's applied, i.e. 

I=[ d n x5(c i 1 x i -d 1 )... 5{c i m x i - d m )e~ S{x) . (Al) 



First, we approximate the delta functions by a narrow gaussian, described with some covariance matrix D which 
will eventually be taken to zero: 

S(C T X -d)= <5( C , V - d 1 ) . . . S(c, m X l - d" 1 ) — e -(c < m x < -«< m )0- 1 mn(c ( "x < -d")/2 ( A2) 

Next, we find the saddle point of the entire integrand and approximate the integrand as a gaussian around the saddle 
point. The saddle point is where the first derivative 

cD^^x-d^.+Sj (A3) 

of minus the exponent is zero, and the matrix M to tai of second derivatives of minus the exponent has components 

c J D- 1 c T |..+5 ij , (A4) 
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We need to evaluate both exponent and prefactor terms to obtain the full saddle-point approximation to (|A1[) . At 
the saddle point, it turns out that the "delta-function" part of the exponent tends to zero as D — > 0, as can be seen 
as follows. Multiplying (|A3|) by c on the left at the saddle point ("s.p." in formulae below) and rearranging yields 



and so the "delta-function" part is 



c T x-d=-D{c T cy 1 S^\ sp (A5) 



5j| ap (c^r^c)- 1 S,4 p /2. (A6) 



If there exists a sensible solution as D — > 0, then >5^| sp will only have a weak dependence on D and so in the limit 
the above term will vanish. 

The prefactor comes from the gaussian integral over deviations away from the saddle point, which is proportional 
to the reciprocal square root of M tota i evaluated at the saddle point. Splitting M tota i into the piece c.D -1 c T from the 
delta function and the piece M with components <S,-y| sp , we have | Mtotai | = |M||-D _1 ||.D + c TAf~ 1 c|. Combining 



with the \/|-D| term in the denominator of (|A2jl and taking the limit, we obtain an overall prefactor proportional to 
l/V|M||cM-ic T j. 
Hence we obtain: 

f pa — ■ ^ - p~ s (A7) 

y/\M/(2n)\ \2ttc^M-^c\ ' 

with all quantities evaluated at the saddle point. Note that the result depends only on the original integrand and its 
second derivatives but evaluated at the saddle point of the combined integrand. Additionally, 

d n xS(c T x - d) xe- s{x) r> x s . p .I, (A8) 

J d n xS(c T x - d) xx T e~ s ^ « (M- 1 - M- Y c {c^M^c) _1 c 1 M" 1 + z B . p .a£ p .) I. (A9) 

We still have to actually find the saddle point and typically this can be done numerically with a Newton-Raphson- 
type approach as follows. With the first and second derivatives from formulae (|A3j) and (|A4|) above, the Newton- 
Raphson update is of the form: 

Sx = -M-^ (cD-\c T x -d) + VS(x)) . (A10) 

With the saddle point of the limit being the limit of the saddle point we can simply start from a point on the constraint 
surface and take the limit D — straight away to obtain the projected update formula: 

Sprojx = - (m' 1 - M~ l c (c T M -1 c) _1 c T M -1 ^ VS(x) (All) 

("proj" for projected). Iterating this will lead us to the saddle point, which can then be used in (|A7[) to obtain the 
approximate value for the integral. 

In the special case that S is quadratic in x, the saddle point can be solved for analytically and is at x = 
M~ 1 c (D + c T M _1 c) _1 d. So taking D -> 0, we have 

x s .p. = M^c (c T M _1 c) _1 d (for S quadratic in x) (A12) 

and hence 

5 s . p . = d T (c T M _1 c) _1 d/2 (for S quadratic in x). (A13) 
Furthermore, Eqs. (|A"7|) . (|A"8j) and (jA"9j) become exact. 

APPENDIX B: VECTORIZATION AND KRONECKER PRODUCTS 

We here summarize vectorization of matrices and the Kronecker product of two matrices. See Appendix A of [38| 
for further identities and a recent discussion of vectorization in the context of CMB analysis. 
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The vectorization vec(^4) of an m-by-n matrix A is the mn-by-1 column vector formed by stacking the columns of 
A on top of each other. Explicitly, vec(A) T = (An, . . . , A ml ,A 12 , . . . , A m2 , . . . , A ln , . . . , A mn ). 

The Kronecker product of an m-by-n matrix A and a p-by-q matrix B is an mp-by-nq matrix, denoted A ® B, 
formed by appropriately stacking copies of the B matrix that have been multiplied by the elements of A: 



/ AuB A 12 B ••• A ln B\ 
A 21 B A 22 B ••• A 2n B 

A®B = 

\A m iB A m2 ■■■ A m bB J 
When the matrices are of compatible sizes such that the relevant products exist, 

(A <g> B)(C ®D) = (AC) <g> (BD). 

Hence, 

(A&B)- 1 = A' 1 

Also, 

(A ® B) T = A T < 

A useful identity involving both vectorization and the Kronecker product is 

tr (A T BCD) = vec(A) T (D T ® B)vec(C). 



(Bl) 



B~ 



B l 



(B2) 
(B3) 
(B4) 
(B5) 



APPENDIX C: SUMMARY OF MATRIX IDENTITIES USED 

This paper makes extensive use of Sherman-Morrison/ Woodbury- type formulae (see e.g. [39]) for matrix inverses 
and their determinants. Here we briefly show how these results may be derived in order to understand how and when 
they may be applied. We start by considering two related decompositions of the same blocked matrix: 



a ~u\ (l -uc\ (a + ucv T W 1 ()\ 
K v T c- 1 ) \0 1 )\ c- 1 ) \cv T 1 ) 

( 1 o\/a \/l -ar't-u 

War 1 1/ U c- l +i^a- l u) lo 1 , 



(CI) 
(C2) 



Taking the determinant of both decompositions and rearranging yields 

\a + ucv T \ = \a\ \c\ jcT 1 + w T a _1 M| . (C3) 
Inverting both decompositions and equating top-left corners yields 

(a + ucv T )~ = a^ 1 ~ a _1 it (cT 1 + i> T a _1 u) _1 i> T a _1 . (C4) 
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